General comment with purpose of the script, author, …
library() calls at the top
Set default variables and global options
Source additional code
Write the actual code, starting with loading all data files
# This code replicates figure 2 from the
# Baldauf et al. 2022 Journal of Ecology paper.
# Authors: Selina Baldauf, Jane Doe, Jon Doe

library(tidyverse)
library(vegan)

# set defaults
input_file <- "data/results.csv"

# source files
source("R/my_cool_function.R")

# read input
input_data <- read_csv(input_file)
Mark sections
Use comments to break up your file into sections
# Load data ---------------------------------------------------------------
input_data <- read_csv(input_file)

# Plot data ---------------------------------------------------------------
ggplot(input_data, aes(x = x, y = y)) +
  geom_point()
Insert a section label with Ctrl/Cmd + Shift + R
Navigate sections in the file outline
Split your workflow
Don’t put all analysis into one long file
Write multiple files that can be called sequentially
Store intermediate results as .Rdata files so you don’t have to rerun the whole script every time
Write functions that can be called in other scripts
Use the source() function to source these files
Have main workflow files that manage your workflow
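Storing intermediate results pays off most when a script only recomputes a slow step if no stored result exists yet. The sketch below is one possible way to do this with base R's save() and load(); the helper name cache_or_compute() and the temperature values are made up for illustration.

```r
# Hypothetical helper: compute a result once, store it in an .Rdata file,
# and load it from that file on every later run.
cache_or_compute <- function(cache_file, object_name, compute) {
  env <- new.env()
  if (file.exists(cache_file)) {
    # load() restores the saved object into env
    load(cache_file, envir = env)
  } else {
    # run the slow step and save the result under object_name
    assign(object_name, compute(), envir = env)
    save(list = object_name, file = cache_file, envir = env)
  }
  get(object_name, envir = env)
}

# usage: the first call computes and saves, the second call only loads
cache_file <- file.path(tempdir(), "temperature_mean.Rdata")
unlink(cache_file)  # start without a cache for this demo

n_runs <- 0
expensive_mean <- function() {
  n_runs <<- n_runs + 1  # count how often the slow step really runs
  mean(c(12.1, 14.3, 13.8))  # made-up temperatures
}

first <- cache_or_compute(cache_file, "temperature_mean", expensive_mean)
second <- cache_or_compute(cache_file, "temperature_mean", expensive_mean)
```

Delete the .Rdata file whenever the raw data or the computation changes, otherwise you keep working with an outdated result.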
01_prepare-data.R
# Purpose: Calculate daily mean temperature and store in data/
# Authors: Selina Baldauf
library(data.table)

# read raw data
temperatures <- data.table::fread("data/huge_temperature_dataset.csv")

# calculate daily mean temperature (one row per city and date)
temperature_mean <- temperatures[, .(temperature = mean(temperature)),
                                 by = c("city", "date")]

# save as .Rdata file
save(temperature_mean, file = "data/temperature_mean.Rdata")
02_linear-model-analysis.R
# Purpose: Conduct linear model analysis of daily mean temperature in cities
# Authors: Selina Baldauf
load(file = "data/temperature_mean.Rdata")

# linear model
temp_lm <- lm(temperature ~ date, data = temperature_mean)
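A main workflow file can then run the numbered scripts in order. The sketch below assumes a hypothetical run_workflow() function that sources every file named like 01_*.R, 02_*.R, … in alphabetical order; the two tiny demo scripts are made up so the example runs on its own.

```r
# Hypothetical main workflow file (e.g. "run_all.R"): run every numbered
# step script in a folder in alphabetical order, so 01_ runs before 02_.
run_workflow <- function(script_dir) {
  scripts <- sort(list.files(script_dir, pattern = "^[0-9]{2}_.*\\.R$",
                             full.names = TRUE))
  for (script in scripts) {
    message("Running ", basename(script))
    # each step runs in its own environment; steps share results via files
    source(script, local = new.env())
  }
  invisible(basename(scripts))
}

# demo: two tiny step scripts in a temporary folder
demo_dir <- file.path(tempdir(), "analysis")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c(
  "values <- c(4, 8, 15)",
  "save(values, file = file.path(tempdir(), 'values.Rdata'))"
), file.path(demo_dir, "01_prepare-data.R"))
writeLines(c(
  "load(file.path(tempdir(), 'values.Rdata'))",
  "saveRDS(mean(values), file.path(tempdir(), 'result.rds'))"
), file.path(demo_dir, "02_summarise.R"))

ran <- run_workflow(demo_dir)
result <- readRDS(file.path(tempdir(), "result.rds"))
```

Running each step in a fresh environment makes hidden dependencies between scripts visible: a step only sees what an earlier step explicitly saved.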
R/helper01_prepare-data.R
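# Purpose: Function to read and prepare data for analysis
# Authors: Selina Baldauf
prepare_data <- function(path) {
  the_data <- readr::read_csv(file = path)
  # mean temperature per city; dplyr:: prefixes and the native pipe |>
  # (R >= 4.1) mean the sourcing script does not need to load dplyr
  the_data <- the_data |>
    dplyr::group_by(city) |>
    dplyr::summarize(temperature = mean(temperature))
  return(the_data)
}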
analysis/01_linear-model-analysis.R
# Purpose: Conduct linear model analysis of daily mean temperature in cities
# Authors: Selina Baldauf
input_file <- "data/huge_temperature_dataset.csv"
source("R/helper01_prepare-data.R")

# read and prepare the data
temperature <- prepare_data(path = input_file)
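If a helper file should work without any extra packages, the same function can be written in base R. This is a sketch of one possible variant, not the version used above: it assumes the input CSV has the columns city, date and temperature, and the example data are made up.

```r
# Hypothetical base-R variant of prepare_data(): mean temperature per city,
# without readr/dplyr, and with an explicit return value.
prepare_data <- function(path) {
  the_data <- read.csv(path)
  # aggregate() with a formula: mean temperature for each city
  city_means <- aggregate(temperature ~ city, data = the_data, FUN = mean)
  return(city_means)
}

# usage with a small made-up file
example_file <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    city = c("Berlin", "Berlin", "Goettingen", "Goettingen"),
    date = c("2023-04-20", "2023-04-21", "2023-04-20", "2023-04-21"),
    temperature = c(10, 14, 9, 11)
  ),
  example_file,
  row.names = FALSE
)
temperature <- prepare_data(path = example_file)
```

Returning the result explicitly makes the function's output obvious to readers, even though R would also return the value of the last assignment.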
Use safe paths
To read and write files, you need to tell R where to find them.
A common workflow: set the working directory with setwd(), then read files from there. But Jenny Bryan had this to say about that:
If the first line of your R script is setwd("C:\Users\jenny\path\that\only\I\have") I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.
Why?
This is 100% not reproducible: Your computer at exactly this time is (probably) the only one in the world that has this working directory
Avoid setwd() if it is possible in any way!
Avoid setwd()
Use R Studio projects
Project root is automatically the working directory
Give your project to a friend and it will work on their machine as well
Instead of
# my unique path from hell with Windows delimiters, white space and
# special characters
setwd("C:\Users\Selina's PC\My Projects\Göttingen Temperatures\temperatures")
read_csv("data/2023-04-20_temperature_goettingen.csv")
Use the here package to define the location of your files relative to a project root
here determines the project root automatically, e.g. from an empty .here file in the top-level project folder (here::set_here() creates one)
here also recognizes an .Rproj file as a project root
Build your paths with the here() function relative to the project root
# Check where my here project root is
here::dr_here()

# set a path relative to the project root and read file
readr::read_csv(here::here("data/2023-04-20_temperature_goettingen.csv"))
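To see why this works on any machine, here is a simplified base-R sketch of root detection: start in some folder and walk up until a folder contains a .here or .Rproj file. The function find_project_root() is hypothetical; the real here package (built on rprojroot) checks more criteria than this.

```r
# Hypothetical, simplified root detection: walk up from a starting folder
# until a folder containing a .here file or an .Rproj file is found.
find_project_root <- function(start = getwd()) {
  dir <- normalizePath(start, mustWork = TRUE)
  repeat {
    has_here <- file.exists(file.path(dir, ".here"))
    has_rproj <- length(list.files(dir, pattern = "\\.Rproj$")) > 0
    if (has_here || has_rproj) return(dir)
    parent <- dirname(dir)
    if (parent == dir) stop("No project root found above ", start)
    dir <- parent
  }
}

# demo: a project with an .Rproj file and a nested analysis folder
project <- file.path(tempdir(), "temperatures")
dir.create(file.path(project, "analysis", "plots"), recursive = TRUE,
           showWarnings = FALSE)
invisible(file.create(file.path(project, "temperatures.Rproj")))

root <- find_project_root(file.path(project, "analysis", "plots"))
# build a path relative to the root, similar in spirit to here::here()
data_path <- file.path(root, "data", "2023-04-20_temperature_goettingen.csv")
```

Because the root is found from marker files inside the project, the same code gives the right path no matter where the project folder lives on disk.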
Coding style - Spacing
No spaces around parentheses for normal function calls
Spaces before and after () in if or for
Spaces around most operators (<-, ==, +, etc.)
A space before pipes (%>%, |>), followed by a new line
A space before + in ggplot2 code, followed by a new line
library(ggplot2)

# Good
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  geom_point()

# Bad
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species))+geom_point()
Coding style - Line width
Try to limit your line width to 80 characters.
You don’t want to scroll to the right to read all code
80 characters fit on most displays and in most programs
Split your code into multiple lines if it is too long
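What a line-width check does is simple: measure every line and report the ones above the limit. The base-R sketch below is only an illustration with a hypothetical long_lines() function and a made-up script; in practice, the R Studio margin line or lintr does this for you.

```r
# Hypothetical helper: report lines of a script that exceed a width limit.
long_lines <- function(path, limit = 80) {
  code <- readLines(path, warn = FALSE)
  widths <- nchar(code, type = "width")
  too_long <- which(widths > limit)
  data.frame(line = too_long, width = widths[too_long])
}

# usage: the same data.table call once on one long line, once split up
long_call <- paste0("temperature_mean <- temperatures[, .(temperature = ",
                    "mean(temperature)), by = .(city, date)]")
script <- tempfile(fileext = ".R")
writeLines(c(
  long_call,
  "temperature_mean <- temperatures[, .(temperature = mean(temperature)),",
  "                                 by = .(city, date)]"
), script)

report <- long_lines(script)
```

Only the first line is reported; the split version contains the same code but stays within 80 characters per line.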
Do you have to remember and check all of these rules yourself? Luckily, no! R and R Studio provide some nice helpers
Coding style helpers - R Studio
R Studio has style diagnostics that tell you where something is wrong
Tools -> Global Options -> Code -> Diagnostics (tick "Provide R style diagnostics")
You can show a margin line to help you keep your lines within 80 characters
Tools -> Global Options -> Code -> Display
Coding style helpers - {lintr}
The lintr package analyses your code files or entire project and tells you what to fix.
# install the package before you can use it
install.packages("lintr")

# lint specific file
lintr::lint(filename = "analysis/01_prepare_data.R")

# lint a directory (by default the whole project)
lintr::lint_dir()
Coding style helpers - {styler}
The styler package automatically styles your files and projects according to the tidyverse style guide.
# install from CRAN
install.packages("styler")
Use the R Studio Addins for styler:
Pro-Tip: Add a custom keyboard shortcut to style your files
Tools -> Modify Keyboard Shortcuts
Manage dependencies with {renv}
Idea: Have a project-local environment with all packages needed by the project
Keep a log of the packages and versions you use
Restore the local project library on other machines
Why is this useful?
Code keeps working even if you upgrade packages in other projects or in your global library
Collaborators can recreate your local project library with one function
Very simple to use and integrate into your project workflow:
# Step 1: initialize a project level R library
renv::init()

# Step 2: save the current status of your library to a lock file
renv::snapshot()

# Step 3: restore state of your project from renv.lock
renv::restore()
Your collaborators only need to install the renv package; then they can call renv::restore() as well
When you create a new R Studio project, there is a checkbox to initialize it with renv
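The core idea of a lock file is just a record of package names and versions that can be compared with what is installed later. The base-R sketch below illustrates only that idea with hypothetical record_versions() and check_versions() functions and a plain CSV; it is not renv's lock file format, and in a real project renv should do this job.

```r
# Hypothetical sketch of the lock file idea: record package versions now,
# check later whether the installed versions still match.
installed_version <- function(p) as.character(utils::packageVersion(p))

record_versions <- function(packages) {
  data.frame(
    package = packages,
    version = unname(vapply(packages, installed_version, character(1)))
  )
}

check_versions <- function(log) {
  log$installed <- unname(vapply(log$package, installed_version,
                                 character(1)))
  log$matches <- log$version == log$installed
  log
}

# usage with packages that ship with every R installation
log_file <- file.path(tempdir(), "versions.csv")
write.csv(record_versions(c("stats", "utils")), log_file, row.names = FALSE)
status <- check_versions(read.csv(log_file, colClasses = "character"))
```

renv goes much further: it also records where each package came from and can reinstall exactly those versions into a project library with renv::restore().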
Summary
Takeaways
Many of these practices require minimal effort, and you can start adding them to your workflow NOW
A research compendium is a collection of all the digital parts of your research project (data, code, documents). R packages have a similar structure and can therefore be used to publish a fully reproducible version of your project.